Method of performing shape localization

ABSTRACT

A method for performing shape localization in an image includes deriving a model shape from a database of a plurality of sample shapes. The model shape is defined by a set of landmarks. The method further includes deriving a texture likelihood model of present sub-patches of the set of landmarks defining the model shape in the image, and proposing a new set of landmarks that approximates a true location of features of the shape based on a sample proposal model of the present sub-patches. A CONDENSATION algorithm is used to derive the texture likelihood model and the proposed new set of landmarks.

FIELD OF THE INVENTION

The present invention is in the image analysis field. The invention particularly concerns performing face localization based on a conditional density propagation (CONDENSATION) framework.

BACKGROUND OF THE INVENTION

Face localization detects the locations of predefined detailed facial features and outlines in images. It plays important roles in human face related applications. For example, after faces of different size, shape, pose and expression are aligned, face variations caused by different factors, such as human identity, facial expressions, illumination, etc., can be extracted independently for face recognition, facial expression analysis, and face modeling and synthesis. Face localization is also employed in visual face tracking and model based video coding, in which the face model needs to be aligned with the first video frame so that facial geometry and head pose can be customized. Face localization also plays important roles, for example, in computer vision applications for human-machine interaction. It provides two-dimensional (2D) facial geometry information, which allows face recognition to align faces of different size, shape, pose and expression during training and evaluation stages, so that face variations caused by human identity are modeled better and a higher recognition rate can be achieved.

In recent years, some have proposed techniques to do face localization automatically. In other words, the locations of predefined facial features and outlines are automatically detected and returned in an image in which the upright frontal view of a human face in an arbitrary scene, under arbitrary illumination, and with typical facial expressions is presented. In one known technique, facial features are extracted using deformable template matching, which models facial features and outlines as a parametrized mathematical model (e.g., a piecewise parabolic/quadratic template) and tries to minimize some energy function that defines the fitness between the model and the facial outlines in the image with respect to the model parameters. In another known technique, a statistical shape model is proposed which models the spatial arrangement of facial features statistically, and is used to localize the facial features from a constellation of facial feature candidates calculated using multi-orientation, multi-scale Gaussian derivative filters.

SUMMARY OF THE INVENTION

The present invention is directed to a method for performing shape localization in an image. The method includes deriving a model shape, which is defined by a set of landmarks, from a database of a plurality of sample shapes. A texture likelihood model of present sub-patches of the set of landmarks defining the model shape in the image is derived, and a new set of landmarks that approximates a true location of features of the shape, based on a sample proposal model of the present sub-patches at the set of landmarks, is then proposed. A CONDENSATION algorithm is used to derive the texture likelihood model and the proposed new set of landmarks.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram illustrating face localization formulated in a Bayesian framework;

FIG. 2 is a diagram illustrating face localization in a CONDENSATION framework of the present invention;

FIG. 3 is a flowchart illustrating the process of performing face localization in accordance with one embodiment of the present invention;

FIG. 4 is an example of a face shape defined by a set of landmarks;

FIG. 5 is a diagram illustrating the manner in which a model face shape is obtained from a database of sample face shapes; and

FIG. 6 is a diagram illustrating the hierarchical method of performing the CONDENSATION algorithm in accordance with the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Generally, face localization can be formulated in a Bayesian framework as shown in FIG. 1. Given an image I and a predefined face model m, the goal, i.e., the location of facial features, can be formulated as m* = arg max p(m|I) = arg max p(I|m) p(m), where p(m) is a prior probability distribution of the model, and p(I|m) is a local texture likelihood distribution given a specific face model.

In the present invention, a hierarchical face localization algorithm is proposed based on a conditional density propagation (CONDENSATION) approach. The face outline, i.e., the prior distribution for intrinsic model parameters, is modeled with an Active Shape Model (ASM), and the local texture likelihood model p(I|m) at each landmark defining features of a face outline is modeled with a Mixture of Gaussians. By formulating the face localization problem as a maximum a posteriori (MAP) problem, a CONDENSATION framework is employed to solve this problem, as shown in FIG. 2. To improve the searching speed and robustness, a hierarchical approach is employed.

As the face localization problem is formulated as a MAP problem, the CONDENSATION algorithm, which is known to those skilled in the art, provides a tool to approximate the unknown distribution in a high dimensional space based on a factored random-sampling approach. The idea of factored sampling is that the posterior probability distribution, or posterior, p(m|I) can be modeled by a set of N samples {s^(n)} drawn from the prior probability distribution, or prior, p(m), with corresponding weights π^(n) = p(I|m = s^(n)) evaluated from the local texture likelihood distribution p(I|m). The expectation of a function h(X) with respect to the posterior p(m|I) can be approximated as

$E_{f}\left(h(X)\right) = \lim_{N \to \infty} \frac{\sum_{k=1}^{N} h\left(s^{(k)}\right) \pi^{(k)}}{\sum_{k=1}^{N} \pi^{(k)}} \qquad (1)$
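
As an illustration of the weighted average in equation (1), the following is a minimal Python sketch under stated assumptions. The scalar model, the uniform prior on [-5, 5], and the Gaussian-shaped `toy_likelihood` are hypothetical stand-ins chosen only to make the example runnable; they are not the face model or texture likelihood of the invention.

```python
import math
import random

def factored_sampling_expectation(samples, weights, h):
    """Approximate E[h(X)] under the posterior, as in equation (1),
    from samples drawn from the prior and their likelihood weights."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("all weights are zero; no sample explains the observation")
    return sum(h(s) * w for s, w in zip(samples, weights)) / total

# Hypothetical toy likelihood (an assumption for illustration only):
# a Gaussian-shaped score centred on 2.0.
def toy_likelihood(m, centre=2.0, sigma=0.5):
    return math.exp(-0.5 * ((m - centre) / sigma) ** 2)

rng = random.Random(0)
# Samples s^(k) drawn from an assumed uniform prior p(m) on [-5, 5].
prior_samples = [rng.uniform(-5.0, 5.0) for _ in range(5000)]
# Weights pi^(k) = p(I | m = s^(k)).
pi = [toy_likelihood(s) for s in prior_samples]
posterior_mean = factored_sampling_expectation(prior_samples, pi, lambda m: m)
```

In this toy setting most prior samples receive negligible weight, which is the inefficiency that motivates the iterative reformulation.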

However, this approach may not be practical, as many samples drawn from the model prior p(m) might be wasted if the corresponding π^(k) is too small to contribute to the computation. In one embodiment of the invention, this problem is reformulated in a probabilistic framework of CONDENSATION propagation so that all samples have significant observation probability, and thus sampling efficiency is improved. Denoting m_i to be the state vector at iteration step i, and I_i to be the observation at iteration i,

$p(m_{i} \mid I_{i}) = p(m_{i} \mid I_{i}, I_{i-1}) \propto p(I_{i} \mid m_{i})\, p(m_{i} \mid I_{i-1})$

is obtained.

Therefore, starting from the initial guess of N samples of models, a new set of random samples {m_i^(k), k=1, . . . , N} is drawn from the conditional prior p(m_i|I_(i−1)), and weighted by their measurements π_i^(k) = p(I_i|m = m_i^(k)). This iterates until a convergence condition is satisfied. Accordingly, to make the CONDENSATION framework 12 complete for the task of face localization, the prior model p(m) representing the model face shape 14 or geometry, the local texture likelihood model p(I_i|m_i) 16 representing the features of a face shape such as the eyes, nose, mouth, etc., and a conditional prior model p(m_i|I_(i−1)) representing the sample proposal model, are required (see FIG. 2).
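
The propose-and-weight iteration can be sketched in Python as follows. This is a toy illustration only: the state is a single scalar, the initial guess is an assumed uniform spread, and a generic resample-then-diffuse step stands in for the sample proposal model p(m_i|I_(i−1)) described in this specification. The function name `condensation_1d` and all of its parameters are hypothetical.

```python
import math
import random

def condensation_1d(likelihood, n_samples=200, n_iters=10, diffusion=0.3, seed=0):
    """Toy CONDENSATION-style loop on a scalar state.

    Each iteration weights the current samples by the likelihood, resamples
    in proportion to those weights, and perturbs the survivors. The
    resample-and-diffuse step is a generic stand-in, not the landmark-based
    proposal model of the specification.
    """
    rng = random.Random(seed)
    # Initial guess of N samples (assumed uniform over [-10, 10]).
    samples = [rng.uniform(-10.0, 10.0) for _ in range(n_samples)]
    for _ in range(n_iters):
        weights = [likelihood(m) for m in samples]
        # Draw the next sample set so that high-weight samples are favoured.
        picked = rng.choices(samples, weights=weights, k=n_samples)
        samples = [m + rng.gauss(0.0, diffusion) for m in picked]
    # Weighted mean of the final sample set as the state estimate.
    weights = [likelihood(m) for m in samples]
    return sum(m * w for m, w in zip(samples, weights)) / sum(weights)

# Hypothetical likelihood peaked at 3.0.
estimate = condensation_1d(lambda m: math.exp(-0.5 * (m - 3.0) ** 2))
```

A fixed iteration count is used here in place of a convergence test, purely to keep the sketch short.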

Turning now to FIG. 3 and in one embodiment of the invention, the active shape model (ASM) is used to describe a two-dimensional (2D) human face geometry, i.e., the shape model p(m) 14 (block 18). The landmarks of the shape are represented as a vector S=(x₁, x₂, . . . , x_K, y₁, y₂, . . . , y_K)^T of length 2K, where K is the number of manually labeled landmarks defining a face, for example, 87 landmarks as in FIG. 4. Given a set of manually labeled sample face shapes 26 in a database 28 (best shown in FIG. 5), the labeled face shapes are aligned to the same scale and orientation and normalized using Procrustes analysis, for example. Principal component analysis (PCA) is then applied to the face vectors, and the eigenspace of the face variations is defined by the eigenvectors.

By taking the first k principal components (e.g., k=15 to preserve 85% of the variations), a face shape can be modeled as

$S = \bar{S} + Uw, \qquad (2)$

where $\bar{S}$ is the mean shape of the face, U_(2K×k) is the eigenvector matrix, and w_(k×1) is the parameter vector that defines the face shape model 14. The prior model probability p(m) can be obtained by learning a mixture of Gaussians model after projecting the face vectors into the k-dimensional ASM eigenspace.
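
Equation (2) and its inverse projection can be illustrated with a small Python sketch. It assumes the mean shape and the eigenvector matrix U have already been obtained from PCA and that the columns of U are orthonormal; the helper names and the tiny two-landmark example values are hypothetical.

```python
def reconstruct_shape(mean_shape, U, w):
    """Equation (2): S = mean_shape + U w.
    U is given as 2K rows, each holding k entries."""
    return [m + sum(u * wj for u, wj in zip(row, w))
            for m, row in zip(mean_shape, U)]

def project_shape(mean_shape, U, shape):
    """w = U^T (S - mean_shape); valid when the columns of U are orthonormal."""
    diff = [s - m for s, m in zip(shape, mean_shape)]
    k = len(U[0])
    return [sum(U[r][j] * diff[r] for r in range(len(U))) for j in range(k)]

# Hypothetical example with K = 2 landmarks (vector length 2K = 4) and k = 1.
mean_shape = [0.0, 2.0, 0.0, 0.0]      # (x1, x2, y1, y2)
U = [[0.5], [-0.5], [0.5], [0.5]]      # single unit-length eigenvector
shape = reconstruct_shape(mean_shape, U, [2.0])
w_back = project_shape(mean_shape, U, shape)
```

Projecting a reconstructed shape back into the eigenspace recovers the parameter vector, which is what allows a shape to be summarized by w alone.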

The shape vector S can also be rearranged into another form as

$\hat{S} = \left\{ \begin{pmatrix} x_{1} \\ y_{1} \end{pmatrix}, \begin{pmatrix} x_{2} \\ y_{2} \end{pmatrix}, \ldots, \begin{pmatrix} x_{K} \\ y_{K} \end{pmatrix} \right\},$

where $(\hat{\cdot})$ denotes the rearrangement operation of the shape vector. As the face in the image may be subject to scaling, rotation and translation, the relation can be denoted as

$\hat{S}_{image} = s \begin{bmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{bmatrix} \hat{S} + T, \qquad (3)$

where s is the scaling factor, θ is the angle of rotation, and

$T = \begin{bmatrix} T_{x} \\ T_{y} \end{bmatrix}$

is the translation of the face in the image. Thus, the landmark set of a face in an image can be represented as a compact parameter model m=(s, θ, T, w). The goal of face localization thus becomes to recover the model parameter m given a face image.
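
The rearrangement of the shape vector and the similarity transform of equation (3) can be sketched in Python as below. The function names `rearrange` and `shape_to_image` are hypothetical, and the rotation matrix follows the sign convention written in equation (3).

```python
import math

def rearrange(S):
    """Turn (x1..xK, y1..yK) into the list of landmark pairs [(x1, y1), ...]."""
    K = len(S) // 2
    return list(zip(S[:K], S[K:]))

def shape_to_image(points, s, theta, T):
    """Equation (3): scale by s, rotate by theta, then translate by T.
    The rotation matrix is [[cos, sin], [-sin, cos]], as in the text."""
    c, sn = math.cos(theta), math.sin(theta)
    tx, ty = T
    return [(s * (c * x + sn * y) + tx, s * (-sn * x + c * y) + ty)
            for x, y in points]

# Hypothetical example: one landmark at (1, 0), scale 2, rotation of 90 degrees,
# translation (1, 1).
landmarks = shape_to_image(rearrange([1.0, 0.0]), 2.0, math.pi / 2, (1.0, 1.0))
```

Combined with the shape reconstruction of equation (2), this maps a full parameter sample m=(s, θ, T, w) to landmark positions in the image.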

Given a sample in the model parameter space m=m_i at iteration i, the shape vector of the landmark set in the image can be retrieved by inverse transformation of equations (2) and (3) (block 20). A sub-patch of each landmark (i.e., a small area surrounding each landmark) in the image is then cropped or cut to a specified size. Letting Γ_j denote the sub-patch of landmark j, the local texture likelihood model is defined as

$p(I \mid m) = p(\Gamma_{1}, \Gamma_{2}, \ldots, \Gamma_{K}) = \prod_{j=1}^{K} p(\Gamma_{j}),$

supposing the texture of each landmark is independent. To learn the texture likelihood p(Γ_j) of landmark j from training images, i.e., the sample face shapes 26 from the database 28, the sub-patches of landmark j in the training images are collected and projected into a low dimensional texture eigenspace. A Mixture of Gaussians model is learned from these sub-patch projections to represent the distribution.
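
The product over landmarks can be evaluated as a sum of log-likelihoods, as in the following Python sketch. It makes a strong simplifying assumption for illustration: each sub-patch is taken to be already reduced to a single scalar feature, whereas the text projects sub-patches into a low dimensional texture eigenspace. The mixture parameters shown are hypothetical, not learned values.

```python
import math

def gmm_pdf(x, components):
    """1-D mixture of Gaussians; components = [(weight, mean, std), ...]."""
    return sum(w * math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
               for w, mu, sd in components)

def texture_log_likelihood(features, landmark_models):
    """log p(I|m) = sum over landmarks j of log p(Gamma_j),
    using the independence assumption stated in the text.
    A density that underflows to zero would make math.log fail."""
    return sum(math.log(gmm_pdf(f, model))
               for f, model in zip(features, landmark_models))

# Hypothetical per-landmark mixtures (weight, mean, std) and patch features.
models = [
    [(1.0, 0.0, 1.0)],
    [(0.5, -1.0, 0.5), (0.5, 1.0, 0.5)],
]
features = [0.2, 0.9]
log_p = texture_log_likelihood(features, models)
```

Working in the log domain avoids the numerical underflow that a direct product over many landmarks (for example, 87) could cause.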

The sample proposal model p(m_i|I_(i−1)) enables the samples {m_i} in the model parameter space to migrate toward regions of higher likelihood according to their evaluation of the local observation of facial features in the image (block 22). The collection of local observations of facial features in the image at iteration i can be represented as I_i={Γ₁^(i), Γ₂^(i), . . . , Γ_K^(i)}. By regarding the shape model as a landmark set {p₁, p₂, . . . , p_K}, and representing the proposal model for landmark j as p(p_j^(i)|Γ_j^(i)),

$p(m_{i} \mid I_{i-1}) = p\left(p_{1}^{(i)}, p_{2}^{(i)}, \ldots, p_{K}^{(i)} \mid \Gamma_{1}^{(i-1)}, \Gamma_{2}^{(i-1)}, \ldots, \Gamma_{K}^{(i-1)}\right) = \prod_{j=1}^{K} p\left(p_{j}^{(i)} \mid \Gamma_{j}^{(i-1)}\right)$

is obtained by assuming independence of the proposal model of each landmark.

The proposal model of each landmark is formulated as

$p(p_{j} \mid \Gamma_{j}) = \frac{p\left(p_{j} = (x, y) \mid \Gamma_{(x,y)}\right)}{\sum_{(x,y) \in \Gamma_{j}} p\left(p_{j} = (x, y) \mid \Gamma_{(x,y)}\right)},$

where Γ_((x, y)) means a sub-patch centered at (x, y).

According to Bayes' rule,

$p\left(p_{j} = (x, y) \mid \Gamma_{(x,y)}\right) \propto p\left(\Gamma_{(x,y)} \mid p_{j} = (x, y)\right) p\left(p_{j} = (x, y)\right) = p\left(\Gamma_{(x,y)j}\right) p\left(p_{j} = (x, y)\right),$

where p(Γ_((x, y)j)) is the texture likelihood of landmark j at location (x, y), and p(p_j=(x, y)) can be simply modeled as a uniform distribution in the image.
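
With a uniform location prior, the per-landmark proposal reduces to normalizing texture likelihoods over the candidate positions in the neighborhood. The Python sketch below illustrates this; the callable `likelihood_at`, the 3×3 candidate grid, and the toy score peaked at (11, 10) are hypothetical placeholders for the learned texture likelihood of landmark j.

```python
import math
import random

def landmark_proposal(likelihood_at, candidates):
    """Normalize texture likelihoods over candidate positions.
    The uniform prior p(p_j = (x, y)) cancels in the ratio."""
    scores = [likelihood_at(x, y) for x, y in candidates]
    total = sum(scores)
    if total <= 0.0:
        raise ValueError("texture likelihood is zero at every candidate position")
    return [sc / total for sc in scores]

def propose_location(likelihood_at, candidates, rng):
    """Draw a new landmark position from the normalized proposal."""
    probs = landmark_proposal(likelihood_at, candidates)
    return rng.choices(candidates, weights=probs, k=1)[0]

# Hypothetical 3x3 neighborhood around (10, 10) and a toy likelihood
# that is highest at (11, 10).
candidates = [(x, y) for x in (9, 10, 11) for y in (9, 10, 11)]
toy_score = lambda x, y: math.exp(-((x - 11) ** 2 + (y - 10) ** 2))
probs = landmark_proposal(toy_score, candidates)
new_position = propose_location(toy_score, candidates, random.Random(0))
```

Sampling from the normalized scores, rather than always taking the maximum, keeps the stochastic character of the sample proposal step.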

After the new model sample is proposed as {p₁ ^((i)), p₂ ^((i)), . . . ,p_(K) ^((i))}, the derivative is represented as

$\begin{aligned} \Delta S^{(i)} &= \left( \Delta x_{1}^{(i)}, \Delta x_{2}^{(i)}, \ldots, \Delta x_{K}^{(i)}, \Delta y_{1}^{(i)}, \Delta y_{2}^{(i)}, \ldots, \Delta y_{K}^{(i)} \right)^{T} \\ &= \left( x_{1}^{(i)}, x_{2}^{(i)}, \ldots, x_{K}^{(i)}, y_{1}^{(i)}, y_{2}^{(i)}, \ldots, y_{K}^{(i)} \right)^{T} - \left( x_{1}^{(i-1)}, x_{2}^{(i-1)}, \ldots, x_{K}^{(i-1)}, y_{1}^{(i-1)}, y_{2}^{(i-1)}, \ldots, y_{K}^{(i-1)} \right)^{T} \\ S &= \bar{S} + Uw \\ &= \begin{pmatrix} \bar{S} & U \end{pmatrix} \begin{pmatrix} 1 \\ w \end{pmatrix} \end{aligned}$

to convert from a landmark space to a model parameter space (block 24).

By supposing the rotation angle is very small, the following approximation is obtained

$\begin{aligned} \begin{pmatrix} X_{i} \\ Y_{i} \end{pmatrix} &= s \begin{pmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{pmatrix} \begin{pmatrix} \bar{S}_{i}^{x} & U_{i}^{x} \\ \bar{S}_{i}^{y} & U_{i}^{y} \end{pmatrix} \begin{pmatrix} 1 \\ w \end{pmatrix} + \begin{pmatrix} T_{x} \\ T_{y} \end{pmatrix} \\ &= \begin{pmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{pmatrix} \begin{pmatrix} \bar{S}_{i}^{x} & U_{i}^{x} \\ \bar{S}_{i}^{y} & U_{i}^{y} \end{pmatrix} \begin{pmatrix} s \\ sw \end{pmatrix} + \begin{pmatrix} T_{x} \\ T_{y} \end{pmatrix} \\ &\approx \begin{pmatrix} 1 & \theta \\ -\theta & 1 \end{pmatrix} \begin{pmatrix} \bar{S}_{i}^{x} & U_{i}^{x} \\ \bar{S}_{i}^{y} & U_{i}^{y} \end{pmatrix} w' + \begin{pmatrix} T_{x} \\ T_{y} \end{pmatrix}, \end{aligned}$

where w′=(s, sw)^T. By taking the derivative of X_i, Y_i with respect to θ, T, and w′, we have the following equation

$\begin{pmatrix} dX_{i} \\ dY_{i} \end{pmatrix} = \left[ \begin{pmatrix} \bar{S}_{i}^{y} & U_{i}^{y} \\ -\bar{S}_{i}^{x} & -U_{i}^{x} \end{pmatrix} w' \;\; \vdots \;\; \begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix} \;\; \vdots \;\; \begin{pmatrix} 1 & \theta \\ -\theta & 1 \end{pmatrix} \begin{pmatrix} \bar{S}_{i}^{x} & U_{i}^{x} \\ \bar{S}_{i}^{y} & U_{i}^{y} \end{pmatrix} \right] \begin{pmatrix} d\theta \\ \ldots \\ dT \\ \ldots \\ dw' \end{pmatrix}. \qquad (4)$

The above equation (4) enables ΔS^(i) to be converted into derivatives in parameter space Δm^(i)=(Δs^(i), Δθ^(i), ΔT^(i), Δw^(i)), and m^(i+1)=m^(i)+aΔm^(i) for some 0<a<=1.
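
The linearization can be checked numerically with a short Python sketch. It differentiates the small-angle form of a single landmark directly, with a_x standing for the row (S̄_i^x, U_i^x) and a_y for the row (S̄_i^y, U_i^y), and also shows the damped update m^(i+1)=m^(i)+aΔm^(i). The function names and the parameter ordering (θ, T_x, T_y, w′) are assumptions made for this illustration; solving for Δm from ΔS over all landmarks (for example, by least squares) is not shown.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def landmark_xy(theta, t, w_prime, a_x, a_y):
    """Small-angle landmark position for one landmark.
    a_x = (mean_x, U_x row), a_y = (mean_y, U_y row), w_prime = (s, s*w)."""
    px, py = dot(a_x, w_prime), dot(a_y, w_prime)
    return (px + theta * py + t[0], -theta * px + py + t[1])

def jacobian_rows(theta, w_prime, a_x, a_y):
    """Rows of d(X, Y) / d(theta, T_x, T_y, w') for one landmark,
    obtained by differentiating landmark_xy term by term."""
    px, py = dot(a_x, w_prime), dot(a_y, w_prime)
    row_x = [py, 1.0, 0.0] + [p + theta * q for p, q in zip(a_x, a_y)]
    row_y = [-px, 0.0, 1.0] + [q - theta * p for p, q in zip(a_x, a_y)]
    return row_x, row_y

def damped_update(m, delta_m, a=0.5):
    """m_next = m + a * delta_m with 0 < a <= 1."""
    if not 0.0 < a <= 1.0:
        raise ValueError("step size a must satisfy 0 < a <= 1")
    return [mi + a * di for mi, di in zip(m, delta_m)]
```

A step size a below 1 trades convergence speed for stability when the proposed landmark displacements are noisy.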

Turning now to FIG. 6 and in accordance with an exemplary embodiment of the invention, a face in an image is searched hierarchically, i.e., in a coarse-to-fine manner. First, the image is down-sampled, i.e., the size of the image is reduced, to form, for example, a 3-layer pyramid. The CONDENSATION algorithm as described above starts from the image at the lowest resolution, and gradually refines the search in images at higher resolution. Second, the number of landmarks used in the sample proposal model p(m_i|I_(i−1)) increases as the resolution of the image increases. For example, for a face defined by 87 landmarks, the system can start with 10 landmarks (corresponding to strong facial features that are perceptible at the lowest resolution) at the lowest resolution 30, increase to 60 landmarks for the intermediate level 32, and use all 87 landmarks for the finest level 34. Third, the dimension of the shape eigen-space also increases when the resolution of the image increases. At the lowest resolution 30, the dimension of the eigen-space might only be 1, for example. It then increases to, for example, 7 at the intermediate level 32, and finally reaches 15 at the finest resolution 34. This design greatly improves the computational efficiency, and helps prevent the search from becoming trapped in local minima.
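
The coarse-to-fine schedule can be sketched in Python as follows. The landmark counts (10, 60, 87) and eigen-space dimensions (1, 7, 15) are the example values given above; the factor-of-two reduction per level, the 2×2 block averaging, and the function names are assumptions made for this illustration.

```python
def downsample_by_two(image):
    """Halve each dimension by averaging 2x2 blocks (image is a list of rows)."""
    h = len(image) // 2 * 2
    w = len(image[0]) // 2 * 2
    return [[(image[r][c] + image[r][c + 1]
              + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]

def build_pyramid(image, n_levels=3):
    """Return the pyramid levels ordered from coarsest to finest."""
    levels = [image]
    for _ in range(n_levels - 1):
        levels.append(downsample_by_two(levels[-1]))
    return levels[::-1]

# (number of landmarks, eigen-space dimension) per level, coarsest first,
# using the example values from the text.
SCHEDULE = [(10, 1), (60, 7), (87, 15)]

def coarse_to_fine_plan(image):
    """Pair each pyramid level with its landmark count and eigen-space dimension."""
    pyramid = build_pyramid(image, len(SCHEDULE))
    return [{"size": (len(level), len(level[0])), "landmarks": n, "eig_dim": d}
            for level, (n, d) in zip(pyramid, SCHEDULE)]
```

In such a plan, the estimate obtained at one level would initialize the search at the next finer level, with landmark coordinates and translation rescaled accordingly.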

While specific embodiments of the present invention have been shown and described, it should be understood that other modifications, substitutions and alternatives are apparent to one of ordinary skill in the art. Such modifications, substitutions and alternatives can be made without departing from the spirit and scope of the invention, which should be determined from the appended claims.

1. A method for performing face shape localization in an image, comprising: deriving a model face shape from a database of a plurality of sample face shapes, said model face shape being defined by a set of landmarks; deriving a texture likelihood model of present sub-patches of said set of landmarks defining said model face shape in the image; and proposing a new set of landmarks that approximates a true location of features of the face shape based on a sample proposal model of said present sub-patches; wherein said deriving said texture likelihood model and said proposing said new set of landmarks are conducted using a CONDENSATION algorithm; and said model face shape is derived from a prior probabilistic distribution of a predefined model p(m), said texture likelihood model of said present sub-patches is derived from a local texture likelihood distribution model p(I|m), and said sample proposal model is derived based on a texture likelihood model of subsequent sub-patches of a set of landmarks in the image at proposed locations in a vicinity of said present sub-patches of said present set of landmarks.
 2. The method as defined in claim 1, wherein the a prior probabilistic distribution of a predefined model p(m) describes a 2-dimensional shape represented as a vector, S=(x₁, x₂, . . . , x_(K), y₁, y₂, . . . , y_(K))^(T) of length 2K, where K is a number of landmarks that define said face shape.
 3. The method as defined in claim 2, wherein said face shape is modeled as a vector $S = \bar{S} + Uw$, when aligned with at least one of a plurality of manually labeled shape images by taking a first k principal components, where $\bar{S}$ is a mean shape, and U_(2K×k) is an eigenvector matrix, and w_(k×1) is a parameter vector that defines said shape S.
 4. The method as defined in claim 3, wherein the a prior probabilistic distribution of a predefined model p(m) is obtained by learning a mixture of Gaussian model after projecting said shape vector S in the k dimensional active shape model (ASM) eigenspace.
 5. The method as defined in claim 3, wherein said shape vector S is rearranged as $\hat{S} = \left\{ \begin{pmatrix} x_{1} \\ y_{1} \end{pmatrix}, \begin{pmatrix} x_{2} \\ y_{2} \end{pmatrix}, \ldots, \begin{pmatrix} x_{K} \\ y_{K} \end{pmatrix} \right\},$ where $(\hat{\cdot})$ denotes the rearrangement operation of elements in the shape vector.
 6. The method as defined in claim 5, wherein said shape vector S is denoted as $\hat{S}_{image} = s \begin{bmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{bmatrix} \hat{S} + T,$ where s is a scaling factor, θ is an angle of rotation, and $T = \begin{bmatrix} T_{x} \\ T_{y} \end{bmatrix}$ is the translation in the image, wherein the landmarks of the face shape in the image are represented as parameter model m=(s, θ, T, w).
 7. The method as defined in claim 1, wherein the local texture likelihood distribution model p(I|m) is defined as ${{p\left( {I❘m} \right)} = {{p\left( {\Gamma_{1},\Gamma_{2},\ldots\mspace{11mu},\Gamma_{K}} \right)} = {\prod\limits_{j = 1}^{K}{p\left( \Gamma_{j} \right)}}}},$ supposing the texture of each landmark is independent, and where Γ_(j) denotes a sub-patch of landmark j defining a two-dimensional shape; wherein a texture likelihood p(Γ_(j)) of landmark i is independently learned as a Mixture of Gaussian model of sub-patches of each landmark cropped from a plurality of manually labeled training images in the database projected into their customized feature subspaces.
 8. The method as defined in claim 7, wherein I_(i)={Γ₁ ^((i)), Γ₂ ^((i)), . . . , Γ_(K) ^((i))} is a collection of local observation of shape features in image at iteration i, and ${{p\left( {m_{i}❘I_{i - 1}} \right)} = {{p\left( {p_{1}^{(i)},p_{2}^{(i)},\ldots\mspace{11mu},{p_{\kappa}^{(i)}❘\Gamma_{1}^{({i - 1})}},\Gamma_{2}^{({i - 1})},\ldots\mspace{11mu},\Gamma_{K}^{({i - 1})}} \right)}\mspace{56mu} = {\prod\limits_{j = 1}^{K}{p\left( {p_{j}^{(i)}❘\Gamma_{j}^{({i - 1})}} \right)}}}},$ by regarding a predefined model {p₁, p₂, . . . , p_(K)} as a landmark set, and assuming independence of a p(m_(i)/I_(i−1)) of each of said landmark p(p_(j) ^((i))|Γ_(j) ^((i))); said p(m_(i)/I_(i−1)) of each landmark is formulated as ${{p\left( {p_{j}❘\Gamma_{j}} \right)} = \frac{p\left( {p_{j} = {\left( {x,y} \right)❘\Gamma_{({x,y})}}} \right)}{\sum\limits_{{({x,y})} \in \Gamma_{j}}{p\left( {p_{j} = {\left( {x,y} \right)❘\Gamma_{({x,y})}}} \right)}}},$ where Γ_((x, y)) means a subpatch centered at (x, y); and p(p _(j)=(x, y)|Γ_((x, y)))˜p(Γ_((x, y)) |p _(j)=(x, y))p(p _(j)=(x, y))=p(Γ_((x, y)j))p(p _(j)=(x, y)), where p(Γ_((x, y)j)) is a texture likelihood of landmark j at location (x, y), and p(p_(j)=(x, y)) is modeled as a uniform distribution in the image.
 9. The method as defined in claim 8, wherein the formula ${p\left( {p_{j}❘\Gamma_{j}} \right)} = \frac{p\left( {p_{j} = {\left( {x,y} \right)❘\Gamma_{({x,y})}}} \right)}{\sum\limits_{{({x,y})} \in \Gamma_{j}}{p\left( {p_{j} = {\left( {x,y} \right)❘\Gamma_{({x,y})}}} \right)}}$ for the proposal model of each landmark is converted to model parameter space expressed by an equation, Δm ^((i))=(Δs ^((i)), Δθ^((i)) , ΔT ^((i)) , Δw ^((i))), and m ^((i+1)) =m ^((i)) +aΔm ^((i)) for some 0<a<=1.
 10. The method as defined in claim 9, wherein said model parameter space equation is obtained from a new model sample proposed as {p₁ ^((i)), p₂ ^((i)), . . . , p_(k) ^((i))}, $\quad\begin{matrix} {{\Delta\; S^{(i)}} = \left( {{\Delta\; x_{1}^{(i)}},{\Delta\; x_{2}^{(i)}},\ldots\mspace{11mu},{\Delta\; x_{\kappa}^{(i)}},{\Delta\; y_{1}^{(i)}},{\Delta\; y_{2}^{(i)}},\ldots\mspace{11mu},{\Delta\; y_{\kappa}^{(i)}}} \right)^{T}} \\ {= {\left( {x_{1}^{(i)},x_{2}^{(i)},\ldots\mspace{11mu},x_{\kappa}^{(i)},y_{1}^{(i)},y_{2}^{(i)},\ldots\mspace{11mu},y_{\kappa}^{(i)}} \right)^{T} -}} \\ {\left( {x_{1}^{({i - 1})},x_{2}^{({i - 1})},\ldots\mspace{11mu},x_{\kappa}^{({i - 1})},y_{1}^{({i - 1})},y_{2}^{({i - 1})},\ldots\mspace{11mu},y_{\kappa}^{({i - 1})}} \right)^{T}} \\ {S = {\overset{\_}{S} + {Uw}}} \\ {{= {\left( {\overset{\_}{S}\mspace{14mu} U} \right)\begin{pmatrix} 1 \\ w \end{pmatrix}}};} \end{matrix}$ by supposing the rotation angle is very small, ${\begin{pmatrix} {Xi} \\ {Yi} \end{pmatrix} = {{{{s\begin{pmatrix} {\cos(\theta)} & {\sin(\theta)} \\ {- {\sin(\theta)}} & {\cos(\theta)} \end{pmatrix}}\begin{pmatrix} {\overset{\_}{S}}_{i}^{x} & U_{i}^{x} \\ {\overset{\_}{S}}_{i}^{y} & U_{i}^{y} \end{pmatrix}\begin{pmatrix} 1 \\ w \end{pmatrix}} + \begin{pmatrix} T_{x} \\ T_{y} \end{pmatrix}}\mspace{59mu} = {{{\begin{pmatrix} {\cos(\theta)} & {\sin(\theta)} \\ {- {\sin(\theta)}} & {\cos(\theta)} \end{pmatrix}\begin{pmatrix} {\overset{\_}{S}}_{i}^{x} & U_{i}^{x} \\ {\overset{\_}{S}}_{i}^{y} & U_{i}^{y} \end{pmatrix}\begin{pmatrix} s \\ {sw} \end{pmatrix}} + \begin{pmatrix} T_{x} \\ T_{y} \end{pmatrix}}\mspace{59mu} \approx {{\begin{pmatrix} 1 & \theta \\ {- \theta} & 1 \end{pmatrix}\begin{pmatrix} {\overset{\_}{S}}_{i}^{x} & U_{i}^{x} \\ {\overset{\_}{S}}_{i}^{y} & U_{i}^{y} \end{pmatrix}w^{\prime}} + \begin{pmatrix} T_{x} \\ T_{y} \end{pmatrix}}}}};\mspace{11mu}{and}$ by taking derivative of X_(i), Y_(i) with respect to θ, T, and w′, $\begin{pmatrix} {dX}_{i} \\ {dY}_{i} 
\end{pmatrix} = {\left\lbrack {\begin{pmatrix} {\overset{\_}{S}}_{i}^{y} & U_{i}^{y} \\ {\overset{\_}{S}}_{i}^{x} & {- U_{i}^{x}} \end{pmatrix}w^{\prime}\begin{matrix} \; & 1 & 0 & \; \\ \vdots & \; & \; & \vdots \\ \; & 0 & 1 & \; \end{matrix}\begin{pmatrix} 1 & \theta \\ {- \theta} & 1 \end{pmatrix}\begin{pmatrix} {\overset{\_}{S}}_{i}^{x} & U_{i}^{x} \\ {\overset{\_}{S}}_{i}^{y} & U_{i}^{y} \end{pmatrix}} \right\rbrack{\begin{pmatrix} {d\;\theta} \\ \ldots \\ {dT} \\ \ldots \\ {dw}^{\prime} \end{pmatrix}.}}$
 11. The method as defined in claim 1, wherein the CONDENSATION algorithm is performed separately on at least one reduced-size of the image prior to performing the CONDENSATION algorithm on a full-size of the image.
 12. The method as defined in claim 11, wherein the CONDENSATION algorithm is performed separately for a plurality of image resolutions starting from a low image resolution to a high image resolution.
 13. The method as defined in claim 12, wherein the number of landmarks defining said model face shape is increased from the low image resolution to the high image resolution.
 14. The method as defined in claim 13, wherein the dimension of the shape eigen-space is also increased from the low image resolution to the high image resolution.
 15. The method as defined in claim 11, wherein the CONDENSATION algorithm is performed hierarchically on a plurality of image resolutions starting from a low image resolution to a high image resolution.
 16. A method for performing a face localization in an image based on a Bayesian rule, comprising: deriving a predefined face shape model m; and employing a conditional density propagation (CONDENSATION) algorithm to locate a face shape in the image using a prior probabilistic distribution of a model p(m) based on said predefined face shape model m, and a local texture likelihood distribution given said predefined face shape model with specific model parameters p(I|m).
 17. The method as defined in claim 16, wherein the CONDENSATION algorithm is performed separately on at least one reduced-size of the image prior to performing the CONDENSATION algorithm on a full-size of the image.
 18. The method as defined in claim 16, wherein the a prior probabilistic distribution of a predefined model p(m) describes a 2-dimensional face shape represented as a vector, S=(x₁, x₂, . . . , x_(K), y₁, y₂, . . . , y_(K))^(T) of length 2K, where K is a number of landmarks that define said face shape.
 19. The method as defined in claim 18, wherein the face shape is modeled as a vector $S = \bar{S} + Uw$, when aligned with at least one of a plurality of manually labeled face images by taking a first k principal components, where $\bar{S}$ is a mean face shape, and U_(2K×k) is an eigenvector matrix, and w_(k×1) is a parameter vector that defines said face shape S.
 20. The method as defined in claim 19, wherein the a prior probabilistic distribution of a predefined model p(m) is obtained by learning a mixture of Gaussian model after projecting said face vector S in the k dimensional active shape model (ASM) eigenspace.
 21. The method as defined in claim 20 wherein, said face shape vector S is rearranged as ${\hat{S} = \left\{ {\begin{pmatrix} x_{1} \\ y_{1} \end{pmatrix},\begin{pmatrix} x_{2} \\ y_{2} \end{pmatrix},\ldots\mspace{11mu},\begin{pmatrix} x_{\kappa} \\ y_{\kappa} \end{pmatrix}} \right\}},$ where ({circumflex over (·)}) denotes the rearrangement operation of shape vector.
 22. The method as defined in claim 21, wherein said shape vector S is denoted as $\hat{S}_{image} = s \begin{bmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{bmatrix} \hat{S} + T,$ where s is a scaling factor, θ is an angle of rotation, and $T = \begin{bmatrix} T_{x} \\ T_{y} \end{bmatrix}$ is the translation in the image, wherein the landmarks of the face in the image are represented as parameter model m=(s, θ, T, w).
 23. The method as defined in claim 16, wherein the local texture likelihood distribution model p(I|m) is defined as ${{p\left( {I❘m} \right)} = {{p\left( {\Gamma_{1},\Gamma_{2},\ldots\mspace{11mu},\Gamma_{K}} \right)} = {\prod\limits_{j = 1}^{K}{p\left( \Gamma_{j} \right)}}}},$ supposing the texture of each landmark is independent, and where Γ_(j) denotes a sub-patch of landmark j defining a two-dimensional face shape; wherein a texture likelihood p(Γ_(j)) of landmark i is independently learned as a Mixture of Gaussian model of sub-patches of each landmark cropped from a plurality of manually labeled training images in the database projected into their customized feature subspaces.
 24. The method as defined in claim 23, wherein I_(i)={Γ₁ ^((i)), Γ₂ ^((i)), . . . , Γ_(K) ^((i))} is a collection of local observation of facial features in image at iteration i, and ${{p\left( {m_{i}❘I_{i - 1}} \right)} = {{p\left( {p_{1}^{(i)},p_{2}^{(i)},\ldots\mspace{11mu},{p_{\kappa}^{(i)}❘\Gamma_{1}^{({i - 1})}},\Gamma_{2}^{({i - 1})},\ldots\mspace{11mu},\Gamma_{K}^{({i - 1})}} \right)}\mspace{56mu} = {\prod\limits_{j = 1}^{K}{p\left( {p_{j}^{(i)}❘\Gamma_{j}^{({i - 1})}} \right)}}}},$ by regarding a predefined model {p₁, p₂, . . . , p_(K)} as a landmark set, and assuming independence of a p(m_(i)/I_(i−1)) of each of said landmark p(p_(j) ^((i))|Γ_(j) ^((i))); said p(m_(i)/I_(i−1)) of each landmark is formulated as ${{p\left( {p_{j}❘\Gamma_{j}} \right)} = \frac{p\left( {p_{j} = {\left( {x,y} \right)❘\Gamma_{({x,y})}}} \right)}{\sum\limits_{{({x,y})} \in \Gamma_{j}}{p\left( {p_{j} = {\left( {x,y} \right)❘\Gamma_{({x,y})}}} \right)}}},$ where Γ_((x, y)) means a subpatch centered at (x, y); and p(p _(j)=(x, y)|Γ_((x, y)))˜p(Γ_((x, y)) |p _(j)=(x, y))p(p _(j)=(x, y))=p(Γ_((x, y)j))p(p _(j)=(x, y)), where p(Γ_((x, y)j)) is a texture likelihood of landmark j at location (x, y), and p(p_(j)=(x, y)) is modeled as a uniform distribution in the image.
 25. The method as defined in claim 24, wherein the formula ${p\left( {p_{j}❘\Gamma_{j}} \right)} = \frac{p\left( {p_{j} = {\left( {x,y} \right)❘\Gamma_{({x,y})}}} \right)}{\sum\limits_{{({x,y})} \in \Gamma_{j}}{p\left( {p_{j} = {\left( {x,y} \right)❘\Gamma_{({x,y})}}} \right)}}$ for the proposal model of each landmark is converted to model parameter space expressed by an equation, Δm ^((i))=(Δs ^((i)), Δθ^((i)) , ΔT ^((i)) , Δw ^((i))), and m ^((i+1)) =m ^((i)) +aΔm ^((i)) for some 0<a<=1.
 26. The method as defined in claim 25, wherein said model parameter space equation is obtained from a new model sample proposed as {p₁ ^((i)), p₂ ^((i)), . . . , p_(k) ^((i))}, $\quad\begin{matrix} {{\Delta\; S^{(i)}} = \left( {{\Delta\; x_{1}^{(i)}},{\Delta\; x_{2}^{(i)}},\ldots\mspace{11mu},{\Delta\; x_{\kappa}^{(i)}},{\Delta\; y_{1}^{(i)}},{\Delta\; y_{2}^{(i)}},\ldots\mspace{11mu},{\Delta\; y_{\kappa}^{(i)}}} \right)^{T}} \\ {= {\left( {x_{1}^{(i)},x_{2}^{(i)},\ldots\mspace{11mu},x_{\kappa}^{(i)},y_{1}^{(i)},y_{2}^{(i)},\ldots\mspace{11mu},y_{\kappa}^{(i)}} \right)^{T} -}} \\ {\left( {x_{1}^{({i - 1})},x_{2}^{({i - 1})},\ldots\mspace{11mu},x_{\kappa}^{({i - 1})},y_{1}^{({i - 1})},y_{2}^{({i - 1})},\ldots\mspace{11mu},y_{\kappa}^{({i - 1})}} \right)^{T}} \\ {S = {\overset{\_}{S} + {Uw}}} \\ {{= {\left( {\overset{\_}{S}\mspace{14mu} U} \right)\begin{pmatrix} 1 \\ w \end{pmatrix}}};} \end{matrix}$ by supposing the rotation angle is very small, ${\begin{pmatrix} {Xi} \\ {Yi} \end{pmatrix} = {{{{s\begin{pmatrix} {\cos(\theta)} & {\sin(\theta)} \\ {- {\sin(\theta)}} & {\cos(\theta)} \end{pmatrix}}\begin{pmatrix} {\overset{\_}{S}}_{i}^{x} & U_{i}^{x} \\ {\overset{\_}{S}}_{i}^{y} & U_{i}^{y} \end{pmatrix}\begin{pmatrix} 1 \\ w \end{pmatrix}} + \begin{pmatrix} T_{x} \\ T_{y} \end{pmatrix}}\mspace{59mu} = {{{\begin{pmatrix} {\cos(\theta)} & {\sin(\theta)} \\ {- {\sin(\theta)}} & {\cos(\theta)} \end{pmatrix}\begin{pmatrix} {\overset{\_}{S}}_{i}^{x} & U_{i}^{x} \\ {\overset{\_}{S}}_{i}^{y} & U_{i}^{y} \end{pmatrix}\begin{pmatrix} s \\ {sw} \end{pmatrix}} + \begin{pmatrix} T_{x} \\ T_{y} \end{pmatrix}}\mspace{59mu} \approx {{\begin{pmatrix} 1 & \theta \\ {- \theta} & 1 \end{pmatrix}\begin{pmatrix} {\overset{\_}{S}}_{i}^{x} & U_{i}^{x} \\ {\overset{\_}{S}}_{i}^{y} & U_{i}^{y} \end{pmatrix}w^{\prime}} + \begin{pmatrix} T_{x} \\ T_{y} \end{pmatrix}}}}};\mspace{14mu}{and}$ by taking derivative of X_(i), Y_(i) with respect to θ, T, and w′, $\begin{pmatrix} {dX}_{i} \\ {dY}_{i} 
\end{pmatrix} = {\left\lbrack {\begin{pmatrix} {\overset{\_}{S}}_{i}^{y} & U_{i}^{y} \\ {\overset{\_}{S}}_{i}^{x} & {- U_{i}^{x}} \end{pmatrix}w^{\prime}\begin{matrix} \; & 1 & 0 & \; \\ \vdots & \; & \; & \vdots \\ \; & 0 & 1 & \; \end{matrix}\begin{pmatrix} 1 & \theta \\ {- \theta} & 1 \end{pmatrix}\begin{pmatrix} {\overset{\_}{S}}_{i}^{x} & U_{i}^{x} \\ {\overset{\_}{S}}_{i}^{y} & U_{i}^{y} \end{pmatrix}} \right\rbrack{\begin{pmatrix} {d\;\theta} \\ \ldots \\ {dT} \\ \ldots \\ {dw}^{\prime} \end{pmatrix}.}}$ 